Skip to content

[TG-8293] Constant propagation for empty strings and string concatenation [blocks: #4941] #4885

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

danpoe
Copy link
Contributor

@danpoe danpoe commented Jul 8, 2019

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • n/a My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@danpoe danpoe requested a review from romainbrenguier July 8, 2019 20:07
{});

symbol_table_baset &st_mutable =
const_cast<symbol_table_baset &>(ns.get_symbol_table());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💀

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫

Copy link
Contributor

@romainbrenguier romainbrenguier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review for the first commit

#include <util/mathematical_expr.h>
#include <util/string_expr.h>

#include <iostream>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably not necessary

const ssa_exprt &char_array,
const array_exprt &new_char_array)
{
// assign length of string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ I find a comment like // length = new_length would be clearer

{});

symbol_table_baset &st_mutable =
const_cast<symbol_table_baset &>(ns.get_symbol_table());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫

bool constant_propagate_empty_string(
const function_application_exprt &f_l1);

void assign_constants(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 documentation is missing for these two functions

target(target)
target(target),
char_type(16),
length_type(32)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these fields shouldn't be necessary and this is not language independent like symex should be.

}

bool symex_assignt::constant_propagate_empty_string(
const function_application_exprt &f_l1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ could we use a clearer name, like function_application_l1?


INVARIANT(
f_l1.arguments().size() == 2,
"empty string primitive does not take arguments");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not very clear because it asserts there are 2 arguments, but the comment says there are 0

@@ -350,6 +358,92 @@ void symex_assignt::assign_from_struct(
}
}

void symex_assignt::assign_constants(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be made clear in the name that it is for string constants

new_char_array,
{});

address_of_exprt aoe(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐤

f_l1.arguments().size() == 2,
"empty string primitive does not take arguments");

array_exprt char_array({}, char_array_type);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐤

const function_application_exprt &f_l1,
const function_application_exprt &f_l2)
{
const refined_string_exprt &s1 = to_string_expr(f_l2.arguments().at(2));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

used only once, so can be inlined below. There are a lot of variables representing basically the same thing in this function (s1, s1_data_opt, s1_data) so it's bette to try to minimize that 🍋


const array_exprt &s1_data = s1_data_opt->get();

const refined_string_exprt &s2 = to_string_expr(f_l2.arguments().at(3));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🍋

const constant_exprt new_char_array_length =
from_integer(new_size, length_type);

array_typet new_char_array_type(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🍓

operands.insert(
operands.end(),
s2_data.operands().begin(),
s2_data.operands().end());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we should avoid duplication with the code from solver/strings/string_concatenation_builtin_function.h, concatenation is pretty simple and unlikely to change, but for functions like format we definitely need to share the code


array_exprt new_char_array(
std::move(operands),
new_char_array_type);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🍓

@@ -27,4 +31,9 @@ bool simplify(
// this is the preferred interface
exprt simplify_expr(exprt src, const namespacet &ns);

optionalt<std::reference_wrapper<const array_exprt>>
try_get_string_data_array(
Copy link
Contributor

@romainbrenguier romainbrenguier Jul 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 needs documentation

@@ -27,4 +31,9 @@ bool simplify(
// this is the preferred interface
exprt simplify_expr(exprt src, const namespacet &ns);

optionalt<std::reference_wrapper<const array_exprt>>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ optional references can be replaced by pointers

@danpoe danpoe force-pushed the feature/constant-propagate-string-concatenation branch from 8bd258e to 19f8245 Compare July 17, 2019 16:00
@codecov-io
Copy link

codecov-io commented Jul 17, 2019

Codecov Report

Merging #4885 into develop will increase coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #4885      +/-   ##
===========================================
+ Coverage    69.27%   69.28%   +0.01%     
===========================================
  Files         1309     1309              
  Lines       108460   108515      +55     
===========================================
+ Hits         75133    75184      +51     
- Misses       33327    33331       +4
Impacted Files Coverage Δ
jbmc/src/java_bytecode/java_string_literals.cpp 94.93% <ø> (-0.42%) ⬇️
src/util/string_utils.h 100% <ø> (ø) ⬆️
src/goto-symex/goto_symex.h 92.3% <ø> (ø) ⬆️
src/goto-symex/goto_symex.cpp 99.15% <100%> (+2.38%) ⬆️
src/util/simplify_expr.cpp 87.6% <100%> (+0.33%) ⬆️
src/util/string_utils.cpp 93.22% <100%> (+1.06%) ⬆️
src/solvers/strings/array_pool.cpp 100% <100%> (ø) ⬆️
.../strings/string_constraint_generator_constants.cpp 63.33% <0%> (-16.67%) ⬇️
src/util/mathematical_expr.h 90.47% <0%> (-2.63%) ⬇️
.../strings/string_concatenation_builtin_function.cpp 84.09% <0%> (-2.28%) ⬇️
... and 17 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d4e0bd0...d9f75eb. Read the comment docs.

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 19f8245).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/119510875

@danpoe danpoe force-pushed the feature/constant-propagate-string-concatenation branch 8 times, most recently from df38ed0 to c9c6033 Compare July 22, 2019 10:32
@danpoe danpoe changed the title Constant propagation for empty strings and string concatenation Constant propagation for empty strings and string concatenation [depends-on: #4925] Jul 22, 2019
@danpoe danpoe force-pushed the feature/constant-propagate-string-concatenation branch from c9c6033 to 686dd8a Compare July 22, 2019 10:51
@danpoe danpoe marked this pull request as ready for review July 22, 2019 10:53
@danpoe danpoe force-pushed the feature/constant-propagate-string-concatenation branch from 686dd8a to d0b5245 Compare July 22, 2019 11:08
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️
This PR failed Diffblue compatibility checks (cbmc commit: 686dd8a).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/120002129
Status will be re-evaluated on next push.
Common spurious failures include: the cbmc commit has disappeared in the mean time (e.g. in a force-push); the author is not in the list of contributors (e.g. first-time contributors); compatibility was already broken by an earlier merge.

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: d0b5245).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/120004102

danpoe added a commit that referenced this pull request Jul 22, 2019
…space

Create auxiliary symbols with a different name than symbols in namespace [blocks: #4885]
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 71d11d5).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/121745220

@danpoe danpoe force-pushed the feature/constant-propagate-string-concatenation branch 4 times, most recently from f363449 to 329e603 Compare August 6, 2019 11:56
const auto &ibv_type =
to_integer_bitvector_type(to_array_type(char_array.type()).subtype());

const std::size_t n_bits = ibv_type.get_width();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ ibv_type is used only once, it can be inlined here since the variable name doesn't give any information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does provide information, and is more readable as is imo

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't know what ibv meant if it wasn't followed by to_integer_bitvector_type, so at least to me it gives no information

return new_aux_symbol;
}

void goto_symext::associate_array_to_pointer(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ns can be given as an argument

ns,
state.symbol_table);

ssa_exprt ssa_expr(return_symbol.symbol_expr());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const

function_application_exprt array_to_pointer_app(
function_symbol.symbol_expr(), {new_char_array, string_data});

symbolt &return_symbol = get_fresh_aux_symbol(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const

const symbolt &function_symbol =
ns.lookup(ID_cprover_associate_array_to_pointer_func);

function_application_exprt array_to_pointer_app(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const and { } initialization is prefered

aux_symbol.value == new_char_array,
"symbol shall have value derived from char array content");

address_of_exprt string_data(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const

return false;
}

void goto_symext::assign_string_constant(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this function belongs in the goto_symext class, it doesn't seem to use any of its field (apart from ns which could be given as argument). 🐙

}
}

const symbolt &goto_symext::get_new_string_data_symbol(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐙

return try_get_string_data_array(s_pointer_opt->get(), ns);
}

void goto_symext::constant_propagate_empty_string(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐙

@danpoe danpoe force-pushed the feature/constant-propagate-string-concatenation branch from 329e603 to e78d73a Compare August 6, 2019 13:03
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️
This PR failed Diffblue compatibility checks (cbmc commit: e78d73a).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/122123110
Status will be re-evaluated on next push.
Common spurious failures include: the cbmc commit has disappeared in the mean time (e.g. in a force-push); the author is not in the list of contributors (e.g. first-time contributors); compatibility was already broken by an earlier merge.

/// \param symex_assign: object handling symbol assignments
/// \param f_l1: application of function ID_cprover_string_empty_string_func
/// with l1 renaming applied
void constant_propagate_empty_string(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I forgot to ask for unit tests for this function and the one below (ideally also the others that are added to goto_symext)

@romainbrenguier romainbrenguier dismissed their stale review August 7, 2019 10:47

I don't like the changes, but if a code owner thinks they are fine then I won't block it

@danpoe
Copy link
Contributor Author

danpoe commented Aug 7, 2019

Regarding @romainbrenguier's comment above, we have agreed that adding unit tests for the added methods in goto_symext is not strictly necessary as the feature is well-tested by regression tests already. The "don't like the changes" refers to some of the added methods being members of goto_symext as opposed to being static functions (in the same file).

String s2 = "\\";
String s3 = s1 + s2;
assert s3.length() == 2;
assert s3.startsWith("\t");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add assert s3.endsWith("\\");

Copy link
Contributor Author

@danpoe danpoe Aug 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s1.equals(s2) is implemented in the models as s1.length() == s2.length() && s1.startsWith(s2) so here we're checking that the strings are equal. To keep the tests as simple as possible we're not using the models library. I don't think it's necessary to use endsWith() here as this is a test for concatenation, endsWith() is tested elsewhere.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then it should better be assert s3.startsWith("\t\\"); if you want to emulate equals?

StringBuilder sb = new StringBuilder();
String s = sb.toString();
assert s.isEmpty();
assert s.startsWith("");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that always true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it should be.

CORE
Main.class
--function Main.test --property "java::Main.test:()V.assertion.1" --property "java::Main.test:()V.assertion.2"
^Generated [0-9]+ VCC\(s\), 0 remaining after simplification$
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be exactly 1 VCC if you use --property?

Copy link
Contributor Author

@danpoe danpoe Aug 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--property is passed in twice here, so yes I think the value should be 2. I'll change it to use the precise value.

Copy link
Contributor Author

@danpoe danpoe Aug 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it turns out that jbmc translates those assertions to something like:

if(length_return_value != 6)
{
  assert(false);
}

So if length_return_value is 6, we don't symex the assertion and thus get 0 VCCs before simplification. This is probably not stable behaviour though so I'll leave it as is. The important thing is that after simplification we have 0 remaining VCCs.

lhs, expr_skeletont{}, rhs, lhs_if_then_else_conditions);
}
}

bool goto_symext::constant_propagate_assignment_with_side_effects(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but might be needed in the future

@danpoe, are there any operations where this is actually used?

}

optionalt<std::reference_wrapper<const array_exprt>>
goto_symext::try_evaluate_constant_string(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐙

Copy link
Contributor Author

@danpoe danpoe Aug 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer those methods to live in goto_symext for now as opposed to being static. The plan is to eventually create a goto_symex_constant_propagationt member of goto_symext which handles the constant propagation.

@danpoe danpoe force-pushed the feature/constant-propagate-string-concatenation branch from e78d73a to 4fecf4e Compare August 8, 2019 10:11
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 4fecf4e).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/122424687

@danpoe danpoe force-pushed the feature/constant-propagate-string-concatenation branch from 4fecf4e to d9f75eb Compare August 8, 2019 12:26
danpoe and others added 9 commits August 8, 2019 13:26
The function is generally useful for obtaining strings that contain only
alphanumeric characters.
This is so the function can be called outside of simplify_expr.cpp.
Take the content field of a refined string expression instead of taking the
whole refined string expression. This will allow it to be used from the string
mutators constant propagation code in goto_symex.cpp.
….cpp

Previously the string solver returned an error when constant propagation of
strings was used as `array_poolt::insert()` might be called for array
expressions for which there is already a mapping in the `length_of_array` map.
This implements constant propagation of ID_cprover_string_concat_func and
ID_cprover_string_empty_string_func which are the primitives used by
StringBuilder.append(), which in turn is used to implement String concatenation
using +.
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️
This PR failed Diffblue compatibility checks (cbmc commit: d9f75eb).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/122439299
Status will be re-evaluated on next push.
Common spurious failures include: the cbmc commit has disappeared in the mean time (e.g. in a force-push); the author is not in the list of contributors (e.g. first-time contributors); compatibility was already broken by an earlier merge.

@danpoe danpoe merged commit aa5fcda into diffblue:develop Aug 8, 2019
@danpoe danpoe deleted the feature/constant-propagate-string-concatenation branch June 2, 2020 17:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants